Interactive Hierarchical Clustering using Bayesian Nonparametrics
Authors
Abstract
A widely used class of algorithms to understand data is hierarchical clustering, but it is often difficult to reconcile the results of these algorithms with hierarchies constructed by humans. Interaction, or querying humans for constraints on the data, is a popular solution for addressing this discrepancy. In this paper, we propose using leave-one-out interactions to achieve better hierarchies by integrating the interactions into a Bayesian nonparametric posterior distribution over hierarchies.
Similar resources
Revisiting k-means: New Algorithms via Bayesian Nonparametrics
Bayesian models offer great flexibility for clustering applications—Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Ba...
Multiagent Planning with Bayesian Nonparametric Asymptotics
Autonomous multiagent systems are beginning to see use in complex, changing environments that cannot be completely specified a priori. In order to be adaptive to these environments and avoid the fragility associated with making too many a priori assumptions, autonomous systems must incorporate some form of learning. However, learning techniques themselves often require structural assumptions to...
Hierarchical Bayesian Nonparametric Models with Applications
Hierarchical modeling is a fundamental concept in Bayesian statistics. The basic idea is that parameters are endowed with distributions which may themselves introduce new parameters, and this construction recurses. A common motif in hierarchical modeling is that of the conditionally independent hierarchy, in which a set of parameters are coupled by making their distributions depend on a shared ...
Interactive Bayesian Hierarchical Clustering
Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user’s needs. To address this, several methods incorporate constraints obtained from users into clustering algorithms, but unfortunately do not apply to hierarchical clustering. We design an interactive Bayesian algorithm that incorporates user interaction into hierarchical clustering...
Colouring and breaking sticks: random distributions and heterogeneous clustering
After a review of some of the implications for statistical modelling and analysis of probabilistic results about the Dirichlet process and its close relatives, we introduce a class of simple mixture models in which clusters are of different ‘colours’, with statistical characteristics that are constant within colours, but different between colours. Thus cluster identities are exchangeable only wi...